feat: unify llm-judge and agent-judge, add agentv provider by christso · Pull Request #617 · EntityProcess/agentv

christso · 2026-03-15T13:48:02Z

Summary

Closes #614

- Unified judge types: Absorbed agent-judge into llm-judge with auto-detection. llm-judge now supports three modes:
  - LLM mode (default): Structured JSON evaluation via generateObject
  - Built-in agent mode: When the judge provider is agentv, uses AI SDK generateText with sandboxed filesystem tools
  - Delegate mode: When the judge provider is an agent provider (claude-cli, codex, etc.), sends the evaluation prompt via provider.invoke()
- New agentv provider: Built-in AI SDK provider that parses provider:model strings (e.g., openai:gpt-5-mini) and creates LanguageModel instances via direct SDK calls. Supports openai, anthropic, azure, google.
- CLI flags: Added --judge-target and --model flags to agentv eval for overriding the judge provider across all evaluators
- Hard removal: Removed agent-judge entirely, with no backward compat and no YAML remapping, as if it never existed. Only llm-judge remains.
- Code review fixes: Replaced the cliModel! non-null assertion with an explicit guard, consolidated duplicate delegate methods, added try-catch for RegExp construction in the search_files tool
- CLAUDE.md: Added "Completing Work — E2E Checklist" section requiring e2e verification for all work before finishing
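
To illustrate the explicit guard that replaced the `cliModel!` non-null assertion, here is a minimal sketch. The `EvalCliOptions` and `JudgeOverride` shapes and the `resolveJudgeOverride` name are hypothetical and are not taken from `run.ts` or `orchestrator.ts`; the sketch also assumes that `--model` on its own (without `--judge-target`) requests no override.

```typescript
// Hypothetical shape of the parsed CLI flags; the real option names may differ.
interface EvalCliOptions {
  judgeTarget?: string; // value of --judge-target
  model?: string; // value of --model, e.g. "openai:gpt-5-mini"
}

// Hypothetical result type describing the judge override to apply.
interface JudgeOverride {
  target: string;
  model?: string;
}

// Returns the override implied by the flags, or undefined when none is requested.
function resolveJudgeOverride(opts: EvalCliOptions): JudgeOverride | undefined {
  if (opts.judgeTarget === undefined) {
    return undefined;
  }
  if (opts.judgeTarget === "agentv") {
    // Explicit guard instead of `opts.model!`: fail early with a readable message.
    if (opts.model === undefined || opts.model.trim() === "") {
      throw new Error(
        "--judge-target agentv requires --model (e.g. --model openai:gpt-5-mini)",
      );
    }
    return { target: "agentv", model: opts.model };
  }
  // Any other judge target is passed through unchanged.
  return { target: opts.judgeTarget, model: opts.model };
}
```

With a guard like this, a missing `--model` surfaces as a clear error at flag-parsing time rather than as an undefined value deeper in provider creation.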

Key changes

| File | Change |
| --- | --- |
| `providers/agentv-provider.ts` | New: parses model strings, creates AI SDK LanguageModel |
| `evaluators/llm-judge.ts` | Major: absorbed agent-judge logic, three evaluation modes |
| `evaluators/agent-judge.ts` | Deleted |
| `types.ts` | Removed `AgentJudgeEvaluatorConfig`, added `max_steps`/`temperature` to `LlmJudgeEvaluatorConfig` |
| `registry/builtin-evaluators.ts` | Removed `agentJudgeFactory`, updated `llmJudgeFactory` |
| `loaders/evaluator-parser.ts` | Removed agent-judge backward compat |
| `loaders/eval-yaml-transpiler.ts` | Unified NL conversion, only llm-judge cases |
| `validation/eval-file.schema.ts` | Removed `AgentJudgeSchema` entirely |
| `orchestrator.ts` | `resolveJudgeProvider` handles `--judge-target` override |
| `apps/cli/.../run.ts` | `--judge-target` and `--model` CLI flags |
| `examples/features/agent-judge/` | Deleted entirely |
| `CLAUDE.md` | Added E2E checklist section |

Test plan

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Replace createProviderRegistry with direct createOpenAI/createAnthropic/createAzure/createGoogleGenerativeAI calls to resolve v2/v3 spec version type compatibility issues. Parse "provider:model" strings manually via a switch statement. Simplify test mocks and add coverage for google, azure, and error cases.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
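
The manual "provider:model" parsing can be sketched roughly as follows. This is an illustration only: `parseModelString` and `ParsedModel` are hypothetical names, splitting on the first colon is an assumption, and the sketch returns a plain object where the real provider would call the corresponding SDK factory.

```typescript
type SupportedProvider = "openai" | "anthropic" | "azure" | "google";

// Hypothetical result type; the real provider creates a LanguageModel instead.
interface ParsedModel {
  provider: SupportedProvider;
  modelId: string;
}

function parseModelString(input: string): ParsedModel {
  // Assumption: split on the first colon so model ids may themselves contain colons.
  const sep = input.indexOf(":");
  if (sep <= 0 || sep === input.length - 1) {
    throw new Error(`Expected "provider:model", got "${input}"`);
  }
  const provider = input.slice(0, sep);
  const modelId = input.slice(sep + 1);

  switch (provider) {
    case "openai":
      return { provider: "openai", modelId }; // real code: createOpenAI
    case "anthropic":
      return { provider: "anthropic", modelId }; // real code: createAnthropic
    case "azure":
      return { provider: "azure", modelId }; // real code: createAzure
    case "google":
      return { provider: "google", modelId }; // real code: createGoogleGenerativeAI
    default:
      throw new Error(
        `Unsupported provider "${provider}" (supported: openai, anthropic, azure, google)`,
      );
  }
}
```

For example, `parseModelString("openai:gpt-5-mini")` yields `{ provider: "openai", modelId: "gpt-5-mini" }`, while an unknown prefix or a string without a colon throws.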

Remove agent-judge as a separate evaluator type. LlmJudgeEvaluator now auto-detects mode based on the resolved judge provider:
- LLM providers (azure, anthropic, gemini): structured JSON mode
- Agent providers (claude-cli, copilot, etc.): delegate mode
- agentv provider: built-in AI SDK agent mode with filesystem tools

Closes #614
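
A minimal sketch of the auto-detection described in this commit, assuming the mode is chosen purely from the provider kind string. The `detectJudgeMode` function and `JudgeMode` type are hypothetical, and the contents of `AGENT_PROVIDER_KINDS` here are an illustrative subset built from the names mentioned in this PR, not the real list.

```typescript
type JudgeMode = "llm" | "builtin-agent" | "delegate";

// Illustrative subset only; agentv is deliberately not a member of this set.
const AGENT_PROVIDER_KINDS: ReadonlySet<string> = new Set([
  "claude-cli",
  "codex",
  "copilot",
]);

function detectJudgeMode(providerKind: string): JudgeMode {
  if (providerKind === "agentv") {
    // Built-in AI SDK agent mode with sandboxed filesystem tools.
    return "builtin-agent";
  }
  if (AGENT_PROVIDER_KINDS.has(providerKind)) {
    // Delegate mode: send the evaluation prompt to the agent provider.
    return "delegate";
  }
  // Default: structured JSON evaluation by a plain LLM provider.
  return "llm";
}
```

Keeping agentv out of the agent-provider set is what lets it take the built-in path instead of being treated as a delegate.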

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The transpiler now handles llm-judge with rubrics the same way as agent-judge, expanding rubric items into individual NL assertion strings.

Part of #614
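
The rubric expansion might look roughly like the sketch below. The `RubricItem` and `LlmJudgeCase` shapes (including the `description` field) and the `toNlAssertions` name are hypothetical stand-ins for illustration; the actual config schema in `types.ts` and the transpiler code may differ.

```typescript
// Hypothetical rubric item shape, for illustration only.
interface RubricItem {
  description: string;
}

// Hypothetical llm-judge evaluator case as seen by the transpiler.
interface LlmJudgeCase {
  type: "llm-judge";
  rubrics?: Array<string | RubricItem>;
}

// Expands each rubric item into its own natural-language assertion string.
function toNlAssertions(evaluator: LlmJudgeCase): string[] {
  const rubrics = evaluator.rubrics ?? [];
  return rubrics
    .map((item) => (typeof item === "string" ? item : item.description))
    .map((text) => text.trim())
    .filter((text) => text.length > 0); // drop empty entries
}
```

Each rubric item becomes one assertion string, so a case with three rubric items produces three separate NL assertions.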

- Add explicit guard for --model when --judge-target is agentv (was non-null assertion)
- Consolidate evaluateWithJudgeTarget/evaluateWithDelegatedAgent into shared evaluateWithDelegate
- Add try-catch for RegExp construction in search_files tool (prevents crash on invalid patterns)
- Add comments explaining agentv exclusion from AGENT_PROVIDER_KINDS and AgentJudgeSchema backward compat

Part of #614
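
As an illustration of the RegExp fix, the sketch below wraps pattern construction in a try-catch and returns an error value instead of throwing. `searchLines` and `SearchResult` are hypothetical names, and searching an in-memory array of lines is a simplification of whatever the real search_files tool does against the sandboxed filesystem.

```typescript
type SearchResult =
  | { ok: true; matches: string[] }
  | { ok: false; error: string };

function searchLines(lines: string[], pattern: string): SearchResult {
  let regex: RegExp;
  try {
    // `new RegExp` throws a SyntaxError on invalid patterns such as "(".
    regex = new RegExp(pattern);
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    // Report the problem to the caller instead of crashing the tool.
    return { ok: false, error: `Invalid search pattern: ${message}` };
  }
  // No "g" flag is used, so `test` carries no state between lines.
  return { ok: true, matches: lines.filter((line) => regex.test(line)) };
}
```

Returning a structured error lets the judging agent see that its pattern was invalid and retry with a corrected one.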

cloudflare-workers-and-pages · 2026-03-15T13:48:27Z

Deploying agentv with Cloudflare Pages

Latest commit:	`3c9ed00`
Status:	✅ Deploy successful!
Preview URL:	https://de46f704.agentv.pages.dev
Branch Preview URL:	https://feat-unify-judge-types-614.agentv.pages.dev


christso and others added 9 commits March 15, 2026 12:39

feat: add agentv to ProviderKind

2cb3a00

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

feat: add agentv provider to target resolution

80a20c1

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

feat: add agentv provider implementation

64d8d6d

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

fix: cast openai v3 model to LanguageModel, fix test assertions

d58b34c

feat: add --judge-target and --model CLI flags with orchestrator wiring

bd576b9

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

refactor: unify llm-judge/agent-judge in transpiler NL conversion

a11b2ab


christso and others added 4 commits March 15, 2026 14:06

refactor: remove all agent-judge references from codebase

0f3c6af

docs: add E2E checklist to CLAUDE.md for all work before finishing

3171eb9

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

style: fix biome formatting

effb331

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

chore: remove accidentally committed node_modules symlinks

3c9ed00

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

christso merged commit 228a619 into main Mar 15, 2026
1 check passed

christso deleted the feat/unify-judge-types-614 branch March 15, 2026 19:40

christso mentioned this pull request Mar 15, 2026

refactor: rename judge → grader across codebase #618

Closed

christso merged 13 commits into main from feat/unify-judge-types-614
